There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these issues, we propose a new concept that considers multimodal inputs as a set of sequences, namely deep multimodal sequence sets (DM$^2$S$^2$). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-mode orders of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements within a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements with modality-level granularity. Our concept exhibits performance comparable to or better than the previous set-aware models. Furthermore, we demonstrate that visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
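The residual-attention idea behind IntraMRA can be sketched as follows. This is a minimal NumPy illustration under assumptions of our own (a single learned scoring vector, softmax-normalized element weights), not the authors' implementation: each element of a modality is scored, and the re-weighted features are added back to the input so that important elements are amplified without discarding the original signal.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(x, w):
    """Score each element of one modality, then re-weight the input and
    add it back (residual connection), amplifying important elements.
    x: (num_elements, dim) element features; w: (dim,) scoring vector."""
    scores = softmax(x @ w)            # (num_elements,) importance weights
    return x + scores[:, None] * x, scores

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))       # 5 elements of one modality
out, scores = residual_attention(tokens, rng.normal(size=8))
```

The same pattern, applied across modality-level summaries instead of elements, gives an InterMRA-style weighting with modality-level granularity.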
It is often difficult to correctly infer a writer's emotion from text exchanged online, and differences in recognition between writers and readers can be problematic. In this paper, we propose a new framework for detecting sentences that create differences in emotion recognition between the writer and the reader, and for detecting the expressions that cause such differences. The proposed framework consists of a Bidirectional Encoder Representations from Transformers (BERT)-based detector that identifies sentences causing differences in emotion recognition, and an analysis that acquires the expressions characteristically appearing in such sentences. The detector, trained on a Japanese social networking service (SNS) document dataset annotated by the writer and three readers of each document, detected "hidden-anger sentences" with AUC = 0.772; these sentences gave rise to differences in the recognition of anger. Because SNS documents contain many sentences whose meaning is difficult to interpret, by analyzing the sentences detected by this detector, we acquired several expressions that characteristically appear in hidden-anger sentences. The detected sentences and expressions do not convey anger explicitly, and it is difficult to infer the writer's anger from them, but if the implicit anger is pointed out, it becomes possible to guess why the writer is angry. In practical use, this framework would likely have the ability to mitigate problems arising from such misunderstandings.
Although attention mechanisms have become fundamental components of deep learning models, they are vulnerable to perturbations, which may degrade the prediction performance and model interpretability. Adversarial training (AT) for attention mechanisms has successfully reduced such drawbacks by considering adversarial perturbations. However, this technique requires label information, and thus, its use is limited to supervised settings. In this study, we explore the concept of incorporating virtual AT (VAT) into the attention mechanisms, by which adversarial perturbations can be computed even from unlabeled data. To realize this approach, we propose two general training techniques, namely VAT for attention mechanisms (Attention VAT) and "interpretable" VAT for attention mechanisms (Attention iVAT), which extend AT for attention mechanisms to a semi-supervised setting. In particular, Attention iVAT focuses on the differences in attention; thus, it can efficiently learn clearer attention and improve model interpretability, even with unlabeled data. Empirical experiments based on six public datasets revealed that our techniques provide better prediction performance than conventional AT-based as well as VAT-based techniques, and stronger agreement with evidence provided by humans in detecting important words in sentences. Moreover, our proposal offers these advantages without requiring careful selection of the unlabeled data. That is, even if the model using our VAT-based technique is trained on unlabeled data from a source other than the target task, both the prediction performance and model interpretability can be improved.
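The core label-free mechanism of VAT can be sketched as follows. This is a derivative-free toy illustration, not Attention VAT itself: among random perturbation directions of a fixed norm, we keep the one that most changes the model's output distribution (largest KL divergence), using only the model's own predictions. The toy classifier and all parameter values are assumptions for demonstration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def virtual_adversarial_direction(predict, x, eps=0.1, n_candidates=64, seed=0):
    """Derivative-free sketch of a virtual adversarial perturbation:
    sample random directions of norm eps and keep the one maximizing
    the KL divergence from the unperturbed prediction. No labels are
    needed; only the model's output distribution is used."""
    rng = np.random.default_rng(seed)
    p = predict(x)
    best_r, best_kl = None, -1.0
    for _ in range(n_candidates):
        d = rng.normal(size=x.shape)
        r = eps * d / np.linalg.norm(d)
        div = kl(p, predict(x + r))
        if div > best_kl:
            best_kl, best_r = div, r
    return best_r, best_kl

# Toy unlabeled example: a fixed linear 2-class classifier.
W = np.array([[1.0, -0.5, 0.2], [-0.3, 0.8, -0.6]])
predict = lambda x: softmax(W @ x)
r, div = virtual_adversarial_direction(predict, np.array([0.5, -1.0, 2.0]))
```

In practice VAT uses a gradient-based power-iteration step rather than random search, and Attention VAT applies the perturbation to the attention weights rather than the raw input.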
Edema is a common symptom of kidney disease, and quantitative measurement of edema is desired. This paper presents a method to estimate the degree of edema from facial images taken before and after dialysis of renal failure patients. As tasks to estimate the degree of edema, we perform pre- and post-dialysis classification and body weight prediction. We develop a multi-patient pre-training framework for acquiring knowledge of edema and transfer the pre-trained model to a model for each patient. For effective pre-training, we propose a novel contrastive representation learning method, called weight-aware supervised momentum contrast (WeightSupMoCo). WeightSupMoCo aims to make feature representations of facial images closer in similarity of patient weight when the pre- and post-dialysis labels are the same. Experimental results show that our pre-training approach improves the accuracy of pre- and post-dialysis classification by 15.1% and reduces the mean absolute error of weight prediction by 0.243 kg compared with training from scratch. The proposed method accurately estimates the degree of edema from facial images; our edema estimation system could thus be beneficial to dialysis patients.
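The weight-aware contrastive objective can be sketched as follows. This is a hypothetical NumPy illustration of the idea, not the authors' WeightSupMoCo implementation: same-label (pre/post-dialysis) pairs are pulled together, and more strongly the closer their body weights are, via a weight-similarity kernel. The kernel form, temperature, and bandwidth are assumptions.

```python
import numpy as np

def weight_aware_supcon(feats, labels, weights, tau=0.1, sigma=1.0):
    """Sketch of a weight-aware supervised contrastive loss.
    feats: (n, d) embeddings; labels: (n,) pre/post-dialysis labels;
    weights: (n,) body weights in kg. Same-label pairs are positives,
    weighted by exp(-|weight difference| / sigma)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # weight-similarity kernel: pairs with close body weight count more
    w = np.exp(-np.abs(weights[:, None] - weights[None, :]) / sigma)
    return float(-(w[pos] * log_prob[pos]).sum() / w[pos].sum())

rng = np.random.default_rng(1)
feats = rng.normal(size=(6, 16))              # embeddings of 6 face images
labels = np.array([0, 0, 0, 1, 1, 1])         # pre- vs post-dialysis
weights = np.array([62.0, 61.5, 63.0, 60.8, 60.5, 61.0])  # body weight (kg)
loss = weight_aware_supcon(feats, labels, weights)
```

The actual method combines this kind of supervision with a MoCo-style momentum encoder and queue, which are omitted here for brevity.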
Peripheral blood oxygen saturation (SpO2), an indicator of oxygen levels in the blood, is one of the most important physiological parameters. Although SpO2 is usually measured using a pulse oximeter, non-contact SpO2 estimation methods from facial or hand videos have been attracting attention in recent years. In this paper, we propose an SpO2 estimation method from facial videos based on convolutional neural networks (CNN). Our method constructs CNN models that consider the direct current (DC) and alternating current (AC) components extracted from the RGB signals of facial videos, which are important in the principle of SpO2 estimation. Specifically, we extract the DC and AC components from the spatio-temporal map using filtering processes and train CNN models to predict SpO2 from these components. We also propose an end-to-end model that predicts SpO2 directly from the spatio-temporal map by extracting the DC and AC components via convolutional layers. Experiments using facial videos and SpO2 data from 50 subjects demonstrate that the proposed method achieves a better estimation performance than current state-of-the-art SpO2 estimation methods.
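The DC/AC decomposition of the RGB trace can be sketched as follows. This is a minimal illustration under assumptions of our own (a moving-average filter over roughly one second of frames, a synthetic pulse signal), not the paper's exact filtering pipeline: the slowly varying DC component is a moving average, and the pulsatile AC component is the residual.

```python
import numpy as np

def dc_ac_components(trace, win=30):
    """Split a per-frame RGB mean trace into a slowly varying DC part
    (moving average) and a pulsatile AC part (the residual).
    The window length is an illustrative choice (~1 s at 30 fps)."""
    dc = np.convolve(trace, np.ones(win) / win, mode="same")
    ac = trace - dc
    return dc, ac

fs = 30.0                                   # assumed camera frame rate (Hz)
t = np.arange(0, 10, 1 / fs)
# synthetic trace: skin baseline plus a 1.2 Hz (72 bpm) pulse component
trace = 100.0 + 0.5 * np.sin(2 * np.pi * 1.2 * t)
dc, ac = dc_ac_components(trace)
```

In the proposed method these components are computed per region of a spatio-temporal map and fed to CNN models; the end-to-end variant instead learns the extraction via convolutional layers.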
Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs have adversarial transferability, which means that AEs generated for a source model can fool another black-box model (target model) with a non-trivial probability. In this paper, we investigate the property of adversarial transferability between models, including ConvMixer, for the first time. To objectively verify the property of transferability, the robustness of the models is evaluated by using a benchmark attack method called AutoAttack. In an image classification experiment, ConvMixer was confirmed to be weak against adversarial transferability.
In this paper, we propose an attack method against scrambled face images, particularly images protected by Encryption-then-Compression (EtC), by utilizing existing powerful StyleGAN encoders and decoders for the first time. Instead of reconstructing images identical to the plain ones from the encrypted images, we focus on recovering styles that can reveal identifiable information from the encrypted images. The proposed method trains an encoder by using pairs of plain and encrypted images with a particular training strategy. Whereas state-of-the-art attack methods cannot recover any perceptual information from EtC images, the proposed method discloses personally identifiable information such as hair color, skin color, eyeglasses, gender, and so on. The results show that the reconstructed images have some perceptual similarity to the plain images.
Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs have adversarial transferability, namely, AEs generated for a source model can fool other (target) models. In this paper, we investigate, for the first time, the transferability of models encrypted for adversarially robust defense. To objectively verify the property of transferability, the robustness of the models is evaluated by using a benchmark attack method called AutoAttack. In an image classification experiment, the use of encrypted models was confirmed not only to be robust against AEs but also to reduce the influence of AEs in terms of the transferability of models.
In this paper, a novel method for access control with a secret key is proposed to protect models from unauthorized access. We focus on semantic segmentation models with the vision transformer (ViT), called the segmentation transformer (SETR). Most existing access control methods focus on image classification tasks, or they are limited to CNNs. By using the patch embedding structure that ViT has, trained models and test images can be efficiently encrypted with a secret key, and then semantic segmentation tasks can be carried out in the encrypted domain. In an experiment, the method was confirmed to provide the same accuracy as that of using plain images without any encryption to authorized users with the correct key, and extremely degraded accuracy to unauthorized users.
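A patch-aligned encryption of this kind can be sketched as follows. This is an illustrative toy, not the authors' exact scheme: pixels inside every patch are shuffled with the same key-derived permutation, so the transformation lines up with the patch embedding grid of a ViT. The patch size, keying via a seeded RNG, and single-channel input are assumptions for demonstration.

```python
import numpy as np

def scramble_patches(img, patch=4, key=42, decrypt=False):
    """Shuffle (or unshuffle) the pixels inside every patch of a 2-D
    image with one key-derived permutation, aligned with a ViT patch
    grid. Only the holder of the key can invert the transform."""
    perm = np.random.default_rng(key).permutation(patch * patch)
    if decrypt:
        perm = np.argsort(perm)                # inverse permutation
    h, w = img.shape
    out = img.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = out[i:i + patch, j:j + patch].reshape(-1)
            out[i:i + patch, j:j + patch] = block[perm].reshape(patch, patch)
    return out

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
enc = scramble_patches(img, key=42)
dec = scramble_patches(enc, key=42, decrypt=True)
```

Because the same permutation is applied in every patch, a model whose patch embedding weights are transformed consistently with the key can process encrypted images directly, while a user without the key sees only scrambled inputs.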
It is well known that SNS providers carry out re-compression and resizing of uploaded videos/images, but most conventional methods for detecting tampered videos/images are not robust enough against such operations. In addition, videos can be temporally manipulated, for example by inserting new frames or permuting frames, and such manipulations are difficult to detect with conventional methods. Accordingly, in this paper, we propose a novel method with a robust hashing algorithm that can detect temporally manipulated videos even when the videos have been resized and compressed.